Two Languages - One Annotation Scenario? Experience from the Prague Dependency Treebank

Authors

  • Silvie Cinková
  • Eva Hajičová
  • Jarmila Panevová
  • Petr Sgall
Abstract

This paper compares the two FGD-based annotation scenarios for Czech and for English, with the Czech as the basis. We discuss the secondary predication expressed by infinitive and its functions in Czech and English, respectively. We give a few examples of English constructions that do not have direct counterparts in Czech (e.g., tough movement and causative constructions with make, get, and have), as well as some phenomena central in English but much less employed in Czech (object raising or control in adjectives as nominal predicates), and, last, structures more or less parallel both in their function and distribution, whose respective annotation differs due to significant differences in the respective linguistic traditions (verbs of perception).

1. Introductory Remarks

1.1. The current tasks of corpus linguistics

The expansion of the use of computers for linguistic studies based on very large empirical language material led to the appearance of an allegedly new domain, corpus linguistics. One can then ask what the position of corpus linguistics is with regard to computational linguistics. And also what its relation to “real” linguistics is. It is no doubt that the intersection of the two former domains is very large and also that there is no reason to distinguish between corpus and “real” linguistics. There is no descriptive framework universally accepted since there is a diversity of many different trends in linguistics. A discussion on theoretical characterization of linguistic phenomena and the computerized checking of the adequacy of descriptive frameworks belong to fundamental goals in linguistics, and a highly effective collaboration of researchers in all the relevant fields is needed. This implies also the necessity of a systematic, intrinsic collaboration (if not a symbiosis) of corpus oriented and computational linguistics with linguistic theory.

© 2008 PBML. All rights reserved.
Please cite this article as: Silvie Cinková, Eva Hajičová, Jarmila Panevová, Petr Sgall, Two Languages – One Annotation Scenario? Experience from the Prague Dependency Treebank. The Prague Bulletin of Mathematical Linguistics No. 89, 2008, 5–22.

Similar resources

The Tectogrammatics of English: on Some Problematic Issues from the Viewpoint of the Prague Dependency Treebank

The present paper is aimed to illustrate how the description of underlying structures carried out in annotating Czech texts may be used as a basis for comparison with a more or less parallel description of English. Specific attention is given to several points in which there are differences between the two languages that concern not only their surface or outer form, but (possibly) also their un...

From Sentence to Discourse: Building an Annotation Scheme for Discourse Based on Prague Dependency Treebank

The present paper reports on a preparatory research for building a language corpus annotation scenario capturing the discourse relations in Czech. We primarily focus on the description of the syntactically motivated relations in discourse, basing our findings on the theoretical background of the Prague Dependency Treebank 2.0 and the Penn Discourse Treebank 2. Our aim is to revisit the present-...

An annotation scheme for Persian based on Autonomous Phrases Theory and Universal Dependencies

A treebank is a corpus with linguistic annotations above the level of the parts of speech. During the first half of the present decade, three treebanks have been developed for Persian either originally or subsequently based on dependency grammar: Persian Treebank (PerTreeBank), Persian Syntactic Dependency Treebank, and Uppsala Persian Dependency Treebank (UPDT). The syntactic analysis of a sen...

Complex Corpus Annotation: The Prague Dependency Treebank

The Prague Dependency Treebank (Hajič et al., 2001) is approaching the publication of its second version in which the tectogrammatical annotation is being added to the morphological and analytical (surface-syntactic) one. In this article, the Prague Dependency Treebank as a whole is being described, including its brief history. In this volume, there are three more papers with a detailed account...

Prague Dependency Style Treebank for Tamil

Annotated corpora such as treebanks are important for the development of parsers, language applications as well as understanding of the language itself. Only very few languages possess these scarce resources. In this paper, we describe our efforts in syntactically annotating a small corpora (600 sentences) of Tamil language. Our annotation is similar to Prague Dependency Treebank (PDT) and cons...

Journal title:
  • Prague Bull. Math. Linguistics

Volume 89

Pages 5–22

Publication date 2008